Skip to content

fix(sdk): force_flush returns meaningful bool on MetricReader#5085

Open
ravitheja4531-cell wants to merge 7 commits into
open-telemetry:mainfrom
ravitheja4531-cell:fix/force-flush-return-meaningful-bool
Open

fix(sdk): force_flush returns meaningful bool on MetricReader#5085
ravitheja4531-cell wants to merge 7 commits into
open-telemetry:mainfrom
ravitheja4531-cell:fix/force-flush-return-meaningful-bool

Conversation

@ravitheja4531-cell

Copy link
Copy Markdown

Fixes #5020

Description

force_flush on MetricReader and PeriodicExportingMetricReader always
returned True regardless of whether the export succeeded or failed, violating
the OTel specification:

ForceFlush SHOULD provide a way to let the caller know whether it succeeded,
failed or timed out.

This PR threads the actual export result up through _receive_metrics
collectforce_flush so callers get a meaningful bool.

Also fixes a pre-existing bug where detach(token) in
PeriodicExportingMetricReader._receive_metrics was only called on the happy
path — moved to finally so it always runs.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Ran the full existing metrics test suite locally:

@ravitheja4531-cell

ravitheja4531-cell commented Apr 14, 2026

Copy link
Copy Markdown
Author

could someone from @jd take a look when you have a moment?
Happy to make any changes needed. Thanks!

@IndiTheo

Copy link
Copy Markdown

I personally like this pretty much. After it's merged, we'll have to open a new issue to indicate failure reason.

Will look at the code with more attention soon.

Btw, are you planning to add unit tests for the new behavior?

**kwargs,
) -> None:
"""Called by `MetricReader.collect` when it receives a batch of metrics"""
) -> bool:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> bool:
) -> bool | None:


@final
def collect(self, timeout_millis: float = 10_000) -> None:
def collect(self, timeout_millis: float = 10_000) -> Optional[bool]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def collect(self, timeout_millis: float = 10_000) -> Optional[bool]:
def collect(self, timeout_millis: float = 10_000) -> bool | None:

@github-project-automation github-project-automation Bot moved this to Reviewed PRs that need fixes in Python PR digest Apr 16, 2026
Comment on lines +611 to +613
collect_ok = super().force_flush(timeout_millis=timeout_millis)
exporter_ok = self._exporter.force_flush(timeout_millis=timeout_millis)
return collect_ok and exporter_ok

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should we test this logic?
  2. Should we even call the exporter's force_flush if the general one fails?

@ravitheja4531-cell ravitheja4531-cell left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all review feedback:

Changed Optional[bool] → bool | None on collect and _receive_metrics (per @herin049)
Removed Optional from imports entirely
Fixed force_flush short-circuit: exporter.force_flush is now skipped if super().force_flush fails (per @Asquator)
Added 4 unit tests covering success, failure, short-circuit, and detach always running in finally

Ready for re-review. Thanks!

@ravitheja4531-cell ravitheja4531-cell force-pushed the fix/force-flush-return-meaningful-bool branch from 60da606 to 5680779 Compare April 20, 2026 12:11

@IndiTheo IndiTheo left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Some static checks fail.

@MikeGoldsmith MikeGoldsmith left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thanks @ravitheja4531-cell.

I've left a couple of small tweaks.

Comment thread opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/export/__init__.py Outdated
Comment thread CHANGELOG.md Outdated
@ravitheja4531-cell

Copy link
Copy Markdown
Author

Noted changed "must" to "should" in the docstring and updated the CHANGELOG to use the PR number. Thanks!
@herin049 @MikeGoldsmith

@ravitheja4531-cell

Copy link
Copy Markdown
Author

can you guys please approve it @herin049 @Asquator @MikeGoldsmith

Comment thread CHANGELOG.md Outdated
@ravitheja4531-cell

Copy link
Copy Markdown
Author

In the original code, detach(token) was called at the end of the method body but not inside a finally block. If an uncaught exception escaped the except Exception handler (e.g. a BaseException like KeyboardInterrupt), detach would never be called, leaking the context token. Moving it to finally ensures it always runs. Happy to remove the mention from the CHANGELOG if you'd prefer to keep the entry focused on the force_flush behavior.

@herin049

@IndiTheo

Copy link
Copy Markdown

The handlng of detach(token) can be extracted into a separate PR if the project philosophy demands it. On the other hand, it's such a small fix that leaving it here will probably do no harm.

All depends on maintainer preferences and how pedantic the library is about separating changes.

Anyway, if it's left here, the merge commit message should be more generic/include both changes.

@MikeGoldsmith MikeGoldsmith left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now looks good to me, pending @herin049's feedback.

@ravitheja4531-cell

ravitheja4531-cell commented Apr 24, 2026

Copy link
Copy Markdown
Author

Hi @MikeGoldsmith @Asquator please approve so that the checks run

@IndiTheo

Copy link
Copy Markdown

I already approved many times. We need a maintainer to approve the workflows.

@github-project-automation github-project-automation Bot moved this from Approved PRs to Reviewed PRs that need fixes in Python PR digest Apr 29, 2026

@ravitheja4531-cell ravitheja4531-cell left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — moved pmr.shutdown() inside the with patch(...) block so the mock doesn't leak into other tests. All 270 metrics tests now pass including test_cpu_time_callback and test_cpu_time_generator. Thanks @emdneto for catching this!

@ravitheja4531-cell ravitheja4531-cell requested a review from emdneto May 3, 2026 15:25
@IndiTheo

IndiTheo commented May 6, 2026

Copy link
Copy Markdown

It shows that the tests are still failing

@MikeGoldsmith

Copy link
Copy Markdown
Member

@ravitheja4531-cell please check CI. It's showing failures, likely because of the linter. Please run tox run precommit.

@ravitheja4531-cell ravitheja4531-cell force-pushed the fix/force-flush-return-meaningful-bool branch from 719f80c to 23c4e8e Compare May 12, 2026 08:05

@ravitheja4531-cell ravitheja4531-cell left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated test_detach_called_on_export_failure to use wraps=export_module.detach so the real detach function still executes while allowing call verification. All 3 tests pass locally including test_cpu_time_callback and test_cpu_time_generator. Also rebased onto latest main.

@ravitheja4531-cell ravitheja4531-cell left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — moved import opentelemetry.sdk.metrics._internal.export as _export_module to top level and used wraps=_export_module.detach in the patch so real detach still executes. All pre-commit checks pass locally.

@emdneto

emdneto commented May 12, 2026

Copy link
Copy Markdown
Member

Thanks for the PR!

Just a heads-up: we no longer update CHANGELOG.md directly. The changelog is now generated from changelog fragments using Towncrier.

Please add the appropriate changelog fragment for this change instead of editing CHANGELOG.md manually. You can find the instructions and expected format in CONTRIBUTING.md.

@ravitheja4531-cell ravitheja4531-cell left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @emdneto — I checked CONTRIBUTING.md on main and there's no mention of Towncrier or changelog fragments. Recent merged PRs (e.g. #5179, #5163) are all editing CHANGELOG.md directly. I've added the entry to CHANGELOG.md as per the existing convention. Please let me know if there's a different process I should follow.

@ravitheja4531-cell

Copy link
Copy Markdown
Author

Hi @emdneto: Can you with your review when you get a chance.

@ravitheja4531-cell

ravitheja4531-cell commented May 23, 2026

Copy link
Copy Markdown
Author

@emdneto: Did you get a chance to review/any other additional steps needed.

@IndiTheo

Copy link
Copy Markdown

Hello, is there a chance this can be merged in the near future?

@ravitheja4531-cell

Copy link
Copy Markdown
Author

Hello, is there a chance this can be merged in the near future?

Is I want to merge soon, please help me know if anything else needs to be done.

@ravitheja4531-cell

Copy link
Copy Markdown
Author

Hello, is there a chance this can be merged in the near future?

Is I want to merge soon, please help me know if anything else needs to be done.

@emdneto @IndiTheo : Can you help move forward.

@IndiTheo

Copy link
Copy Markdown

I see the feature itself as done. Maybe there are some caveats with the .md files indeed, I don't know much about that.
@MikeGoldsmith @herin049 how can we advance from here?

@ravitheja4531-cell

Copy link
Copy Markdown
Author

I see the feature itself as done. Maybe there are some caveats with the .md files indeed, I don't know much about that. @MikeGoldsmith @herin049 how can we advance from here?

Can someone help to move faster and merge this PR please @IndiTheo @herin049 @MikeGoldsmith

@ravitheja4531-cell

Copy link
Copy Markdown
Author

he feature itself as done. Maybe there are some caveats with the .md files indeed, I don't know much about that. @MikeGoldsmith @herin049 how can we advance from here?

Can someone help to move faster and merge this PR please @IndiTheo @herin049 @MikeGoldsmith

Hello Team, @MikeGoldsmith @herin049 how can we advance from here ? Need assistance to merge this asap.

@ravitheja4531-cell

Copy link
Copy Markdown
Author

he feature itself as done. Maybe there are some caveats with the .md files indeed, I don't know much about that. @MikeGoldsmith @herin049 how can we advance from here?

Can someone help to move faster and merge this PR please @IndiTheo @herin049 @MikeGoldsmith

Hello Team, @MikeGoldsmith @herin049 how can we advance from here ? Need assistance to merge this asap.

Hello @MikeGoldsmith @herin049: Any insights on this PR ? Requesting assistance to merge this sooner.

@ravitheja4531-cell

Copy link
Copy Markdown
Author

@emdneto @IndiTheo: Any insights on this PR ? Requesting assistance to merge this sooner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Reviewed PRs that need fixes

Development

Successfully merging this pull request may close these issues.

force_flush should provide a way to let the caller know whether it succeeded, failed or timed out.

6 participants